{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# CTW dataset tutorial (Part 1: basics)\n",
    "\n",
    "Hello, welcome to the tutorial of _Chinese Text in the Wild_ (CTW) dataset. In this tutorial, we will show you:\n",
    "\n",
    "1. [Basics](#CTW-dataset-tutorial-(Part-1:-Basics)\n",
    "\n",
    "  - [The structure of this repository](#The-structure-of-this-repository)\n",
    "  - [Dataset split](#Dataset-Split)\n",
    "  - [Download images and annotations](#Download-images-and-annotations)\n",
    "  - [Annotation format](#Annotation-format)\n",
    "  - [Draw annotations on images](#Draw-annotations-on-images)\n",
    "  - [Appendix: Adjusted bounding box conversion](#Appendix:-Adjusted-bounding-box-conversion)\n",
    "\n",
    "2. Classification baseline\n",
    "\n",
    "  - Train classification model\n",
    "  - Results format and evaluation API\n",
    "  - Evaluate your classification model\n",
    "\n",
    "3. Detection baseline\n",
    "\n",
    "  - Train detection model\n",
    "  - Results format and evaluation API\n",
    "  - Evaluate your classification model\n",
    "\n",
    "Our homepage is https://ctwdataset.github.io/, you may find some more useful information from that.\n",
    "\n",
    "If you don't want to run the baseline code, please jump to [Dataset split](#Dataset-Split) and [Annotation format](#Annotation-format) sections.\n",
    "\n",
    "Notes:\n",
    "  > This notebook MUST be run under `$CTW_ROOT/tutorial`.\n",
    "  >\n",
 All the code SHOULD">
    "  > All the code SHOULD be run with `Python>=3.4`. Compatibility with `Python>=2.7` is provided on a best-effort basis.\n",
    "  >\n",
    "  > The key words \"MUST\", \"MUST NOT\", \"REQUIRED\", \"SHALL\", \"SHALL NOT\", \"SHOULD\", \"SHOULD NOT\", \"RECOMMENDED\", \"MAY\", and \"OPTIONAL\" in this document are to be interpreted as described in [RFC 2119](https://tools.ietf.org/html/rfc2119)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The structure of this repository\n",
    "\n",
    "Our git repository is `git@github.com:yuantailing/ctw-baseline.git`, which you can browse from [GitHub](https://github.com/yuantailing/ctw-baseline).\n",
    "\n",
    "There are several directories under `$CTW_ROOT`.\n",
    "\n",
    "  - **tutorial/**: this tutorial\n",
    "  - **data/**: download and place images and annotations\n",
    "  - **prepare/**: prepare dataset splits\n",
    "  - **classification/**: classification baselines using [TensorFlow](https://www.tensorflow.org/)\n",
    "  - **detection/**: a detection baseline using [YOLOv2](https://pjreddie.com/darknet/yolo/)\n",
    "  - **judge/**: evaluate testing results and draw results and statistics\n",
    "  - **pythonapi/**: APIs to traverse annotations, to evaluate results, and for common use\n",
    "  - **cppapi/**: a faster implementation to detection AP evaluation\n",
    "  - **codalab/**: which we run on [CodaLab](https://competitions.codalab.org/competitions/?q=CTW) (our evaluation server)\n",
    "  - **ssd/**: a detection method using [SSD](https://github.com/weiliu89/caffe/tree/ssd)\n",
    "\n",
    "Most of the above directories have some similar structures.\n",
    "\n",
    "  - **\\*/settings.py**: configure directory of images, file path to annotations, and dedicated configurations for each step\n",
    "  - **\\*/products/**: store temporary files, logs, middle products, and final products \n",
    "  - **\\*/pythonapi**: a symbolic link to `pythonapi/`, in order to use Python API more conveniently\n",
    "\n",
    "Most of the code is written in Python, while some code is written in C++, Shell, etc.\n",
    "\n",
    "All the code is purposed to run in subdirectories, e.g., it's correct to execute `cd $CTW_ROOT/detection && python3 train.py`, and it's incorrect to execute `cd $CTW_ROOT && python3 detection/train.py`.\n",
    "\n",
    "All our code won't create or modify any files out of `$CTW_ROOT` (except `/tmp/`), and don't need a privilege elevation (except for running docker workers on the evaluation server). You SHOULD install requirements before you run our code.\n",
    "\n",
    "  - git>=1\n",
    "  - Python>=3.4\n",
    "  - Jupyter notebook>=5.0\n",
    "  - gcc>=5\n",
    "  - g++>=5\n",
    "  - CUDA driver\n",
    "  - CUDA toolkit>=8.0\n",
    "  - CUDNN>=6.0\n",
    "  - OpenCV>=3.0\n",
    "  - requirements listed in `$CTW_ROOT/requirements.txt`\n",
    "\n",
    "Recommonded hardware requirements:\n",
    "\n",
    "  - RAM >= 32GB\n",
    "  - GPU memory >= 12 GB\n",
    "  - Hard Disk free space >= 200 GB\n",
    "  - CPU logical cores >= 8\n",
    "  - Network connection"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataset Split\n",
    "\n",
    "We split the dataset into 4 parts:\n",
    "\n",
    "1. Training set (~75%)\n",
    "\n",
    "  For each image in training set, the annotation contains a lot of lines, while each lines contains some character instances.\n",
    "  \n",
    "  Each character instance contains:\n",
    "  \n",
    "    - its underlying character,\n",
    "    - its bounding box (polygon),\n",
    "    - and 6 attributes.\n",
    "\n",
    "  Only Chinese character instances are completely annotated, non-Chinese characters (e.g., ASCII characters) are partially annotated.\n",
    "\n",
    "  Some ignore regions are annotated, which contain character instances that cannot be recognized by human (e.g., too small, too fuzzy).\n",
    "\n",
    "  We will show the annotation format in [next sections](#Annotation-format).\n",
    "\n",
    "2. Validation set (~5%)\n",
    "\n",
    "  Annotations in validation set is the same as that in training set.\n",
    "  \n",
    "  The split between training set and validation set is only a recommendation. We make no restriction on how you split them. To enlarge training data, you MAY use TRAIN+VAL to train your models.\n",
    "\n",
    "3. Testing set for classification (~10%)\n",
    "\n",
    "  For this testing set, we make images and annotated bounding boxes publicly available. Underlying character, attributes and ignored regions are not avaliable.\n",
    "\n",
    "  To evaluate your results on testing set, please visit our evaluation server.\n",
    "\n",
    "4. Testing set for detection (~10%)\n",
    "\n",
    "  For this testing set, we make images public.\n",
    "\n",
    "  To evaluate your results on testing set, please visit our evaluation server.\n",
    "\n",
    "Notes:\n",
    "  \n",
 You MUST NOT use annotations">
    "  > You MUST NOT use annotations of the testing sets to fine-tune your models or hyper-parameters (e.g., using annotations of the classification testing set to fine-tune your detection models).\n",
    "  >\n",
    "  > You MUST NOT use the evaluation server to fine-tune your models or hyper-parameters."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Download images and annotations\n",
    "\n",
    "Visit our homepage (https://ctwdataset.github.io/) and gain access to the dataset.\n",
    "\n",
    "1. Clone our git repository.\n",
    "\n",
    "  ```sh\n",
    "$ git clone git@github.com:yuantailing/ctw-baseline.git\n",
    "```\n",
    "\n",
    "1. Download images, and unzip all the images to `$CTW_ROOT/data/all_images/`.\n",
    "\n",
    "  For image file path, both `$CTW_ROOT/data/all_images/0000001.jpg` and `$CTW_ROOT/data/all_images/any/path/0000001.jpg` are OK, do not modify file name.\n",
    "\n",
    "1. Download annotations, and unzip it to `$CTW_ROOT/data/annotations/downloads/`.\n",
    "\n",
    "  ```sh\n",
    "$ mkdir -p ../data/annotations/downloads && tar -xzf /path/to/ctw-annotations.tar.gz -C../data/annotations/downloads\n",
    "```\n",
    "\n",
    "1. In order to run evaluation and analysis code locally, we will use validation set as testing sets in this tutorial.\n",
    "\n",
    "  ```sh\n",
    "$ cd ../prepare && python3 fake_testing_set.py\n",
    "```\n",
    "\n",
    "  If you propose to train your model on TRAIN+VAL, you can execute `cp ../data/annotations/downloads/* ../data/annotations/` instead of running the above code. But you will not be able to run evaluation and analysis code locally, just submit the results to our evaluation server.\n",
    "\n",
    "1. Create symbolic links for TRAIN+VAL (`$CTW_ROOT/data/images/trainval/`) and TEST(`$CTW_ROOT/data/images/test/`) set, respectively.\n",
    "\n",
    "  ```sh\n",
    "$ cd ../prepare && python3 symlink_images.py\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Annotation format\n",
    "\n",
    "In this section, we will show you:\n",
    "\n",
    "- Overall information format\n",
    "- Training set annotation format\n",
    "- Classification testing set format\n",
    "\n",
    "We will display some examples in the next section.\n",
    "\n",
    "#### Overall information format\n",
    "\n",
    "Overall information file (`../data/annotations/info.json`) is UTF-8 (no BOM) encoded [JSON](https://www.json.org/).\n",
    "\n",
    "The data struct for this information file is described below.\n",
    "\n",
    "```\n",
    "information:\n",
    "{\n",
    "    train: [image_meta_0, image_meta_1, image_meta_2, ...],\n",
    "    val: [image_meta_0, image_meta_1, image_meta_2, ...],\n",
    "    test_cls: [image_meta_0, image_meta_1, image_meta_2, ...],\n",
    "    test_det: [image_meta_0, image_meta_1, image_meta_2, ...],\n",
    "}\n",
    "\n",
    "image_meta:\n",
    "{\n",
    "    image_id: str,\n",
    "    file_name: str,\n",
    "    width: int,\n",
    "    height: int,\n",
    "}\n",
    "```\n",
    "`train`, `val`, `test_cls`, `test_det` keys denote to training set, validation set, testing set for classification, testing set for detection, respectively.\n",
    "\n",
    "The resolution of each image is always $2048 \\times 2048$. Image ID is a 7-digits string, the first digit of image ID indicates the camera orientation in the following rule.\n",
    "\n",
    "  - '0': back\n",
    "  - '1': left\n",
    "  - '2': front\n",
    "  - '3': right\n",
    "\n",
    "The `file_name` filed doesn't contain directory name, and is always `image_id + '.jpg'`.\n",
    "\n",
    "#### Training set annotation format\n",
    "\n",
    "All `.jsonl` annotation files (e.g. `../data/annotations/train.jsonl`) are UTF-8 encoded [JSON Lines](http://jsonlines.org/), each line is corresponding to the annotation of one image.\n",
    "\n",
    "The data struct for each of the annotations in training set (and validation set) is described below.\n",
    "```\n",
    "annotation (corresponding to one line in .jsonl):\n",
    "{\n",
    "    image_id: str,\n",
    "    file_name: str,\n",
    "    width: int,\n",
    "    height: int,\n",
    "    annotations: [sentence_0, sentence_1, sentence_2, ...],    # MUST NOT be empty\n",
    "    ignore: [ignore_0, ignore_1, ignore_2, ...],               # MAY be an empty list\n",
    "}\n",
    "\n",
    "sentence:\n",
    "[instance_0, instance_1, instance_2, ...]                 # MUST NOT be empty\n",
    "\n",
    "instance:\n",
    "{\n",
    "    polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]],    # x, y are floating-point numbers\n",
    "    text: str,                                            # the length of the text MUST be exactly 1\n",
    "    is_chinese: bool,\n",
    "    attributes: [attr_0, attr_1, attr_2, ...],            # MAY be an empty list\n",
    "    adjusted_bbox: [xmin, ymin, w, h],                    # x, y, w, h are floating-point numbers\n",
    "}\n",
    "\n",
    "attr:\n",
    "\"occluded\" | \"bgcomplex\" | \"distorted\" | \"raised\" | \"wordart\" | \"handwritten\"\n",
    "\n",
    "ignore:\n",
    "{\n",
    "    polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]],\n",
    "    bbox: [xmin, ymin, w, h],\n",
    "]\n",
    "```\n",
    "\n",
    "Original bounding box annotations are polygons, we will describe how `polygon` is converted to `adjusted_bbox` in [appendix](#Appendix:-Adjusted-bounding-box-conversion).\n",
    "\n",
    "Notes:\n",
    "\n",
 The order of lines">
    "  > The order of lines is not guaranteed to be consistent with `info.json`.\n",
    "  >\n",
    "  > A polygon MUST be a quadrangle.\n",
    "  >\n",
    "  > All characters in `CJK Unified Ideographs` are considered to be Chinese, while characters in `ASCII` and `CJK Unified Ideographs Extension`(s) are not.\n",
    "  >\n",
 Adjusted bboxes">
    "  > Adjusted bboxes of character `instance`s MUST intersect the image, while bboxes of `ignore` regions may not.\n",
    "  >\n",
    "  > Some logos on the camera car (e.g., \"`腾讯街景地图`\" in `2040368.jpg`) and licence plates are ignored to avoid bias.\n",
    "\n",
    "#### Classification testing set format\n",
    "\n",
    "The data struct for each of the annotations in classification testing set is described below.\n",
    "\n",
    "```\n",
    "annotation:\n",
    "{\n",
    "    image_id: str,\n",
    "    file_name: str,\n",
    "    width: int,\n",
    "    height: int,\n",
    "    proposals: [proposal_0, proposal_1, proposal_2, ...],\n",
    "}\n",
    "\n",
    "proposal:\n",
    "{\n",
    "    polygon: [[x0, y0], [x1, y1], [x2, y2], [x3, y3]],\n",
    "    adjusted_bbox: [xmin, ymin, w, h],\n",
    "}\n",
    "```\n",
    "\n",
    "Notes:\n",
    "\n",
 The order of `image_id`">
    "  > The order of `image_id`s across lines is not guaranteed to be consistent with `info.json`.\n",
    "  >\n",
    "  > Non-Chinese characters (e.g., ASCII characters) MUST NOT appear in proposals."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "from __future__ import unicode_literals\n",
    "\n",
    "import json\n",
    "import pprint\n",
    "import settings\n",
    "\n",
    "from pythonapi import anno_tools\n",
    "\n",
    "print('Image meta info format:')\n",
    "with open(settings.DATA_LIST) as f:\n",
    "    data_list = json.load(f)\n",
    "pprint.pprint(data_list['train'][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Training set annotation format:')\n",
    "with open(settings.TRAIN) as f:\n",
    "    anno = json.loads(f.readline())\n",
    "pprint.pprint(anno, depth=3)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Character instance format:')\n",
    "pprint.pprint(anno['annotations'][0][0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Traverse character instances in an image')\n",
    "for instance in anno_tools.each_char(anno):\n",
    "    print(instance['text'], end=' ')\n",
    "print()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "print('Classification testing set format')\n",
    "with open(settings.TEST_CLASSIFICATION) as f:\n",
    "    anno = json.loads(f.readline())\n",
    "pprint.pprint(anno, depth=2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print('Classification testing set proposal format')\n",
    "pprint.pprint(anno['proposals'][0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Draw annotations on images\n",
    "\n",
    "In this section, we will draw annotations on images. This would help you to understand the format of annotations.\n",
    "\n",
    "We show polygon bounding boxes of Chinese character instances in **<span style=\"color: #0f0;\">green</span>**, non-Chinese character instances in **<span style=\"color: #f00;\">red</span>**, and ignore regions in **<span style=\"color: #ff0;\">yellow</span>**."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cv2\n",
    "import json\n",
    "import matplotlib.patches as patches\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "import settings\n",
    "\n",
    "from pythonapi import anno_tools\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "with open(settings.TRAIN) as f:\n",
    "    anno = json.loads(f.readline())\n",
    "path = os.path.join(settings.TRAINVAL_IMAGE_DIR, anno['file_name'])\n",
    "assert os.path.exists(path), 'file not exists: {}'.format(path)\n",
    "img = cv2.imread(path)\n",
    "img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n",
    "\n",
    "plt.figure(figsize=(16, 16))\n",
    "ax = plt.gca()\n",
    "plt.imshow(img)\n",
    "for instance in anno_tools.each_char(anno):\n",
    "    color = (0, 1, 0) if instance['is_chinese'] else (1, 0, 0)\n",
    "    ax.add_patch(patches.Polygon(instance['polygon'], fill=False, color=color))\n",
    "for ignore in anno['ignore']:\n",
    "    color = (1, 1, 0)\n",
    "    ax.add_patch(patches.Polygon(ignore['polygon'], fill=False, color=color))\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Appendix: Adjusted bounding box conversion\n",
    "\n",
    "In order to create a tighter bounding box to character instances, we compute `adjusted_bbox` in following steps, instead of use the real bounding box.\n",
    "\n",
    "  1. Take trisections for each edge of the polygon. (<span style=\"color: #f00;\">red points</span>)\n",
    "  2. Compute the bouding box of above points. (<span style=\"color: #00f;\">blue rectangles</span>)\n",
    "\n",
    "Adjusted bounding box is better than the real bounding box, especially for sharp polygons."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from __future__ import division\n",
    "\n",
    "import collections\n",
    "import matplotlib.patches as patches\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "%matplotlib inline\n",
    "\n",
    "def poly2bbox(poly):\n",
    "    key_points = list()\n",
    "    rotated = collections.deque(poly)\n",
    "    rotated.rotate(1)\n",
    "    for (x0, y0), (x1, y1) in zip(poly, rotated):\n",
    "        for ratio in (1/3, 2/3):\n",
    "            key_points.append((x0 * ratio + x1 * (1 - ratio), y0 * ratio + y1 * (1 - ratio)))\n",
    "    x, y = zip(*key_points)\n",
    "    adjusted_bbox = (min(x), min(y), max(x) - min(x), max(y) - min(y))\n",
    "    return key_points, adjusted_bbox\n",
    "\n",
    "polygons = [\n",
    "    [[2, 1], [11, 2], [12, 18], [3, 16]],\n",
    "    [[21, 1], [30, 5], [31, 19], [22, 14]],\n",
    "]\n",
    "\n",
    "plt.figure(figsize=(10, 6))\n",
    "plt.xlim(0, 35)\n",
    "plt.ylim(0, 20)\n",
    "ax = plt.gca()\n",
    "for polygon in polygons:\n",
    "    color = (0, 1, 0)\n",
    "    ax.add_patch(patches.Polygon(polygon, fill=False, color=(0, 1, 0)))\n",
    "    key_points, adjusted_bbox = poly2bbox(polygon)\n",
    "    ax.add_patch(patches.Rectangle(adjusted_bbox[:2], *adjusted_bbox[2:], fill=False, color=(0, 0, 1)))\n",
    "    for kp in key_points:\n",
    "        ax.add_patch(patches.Circle(kp, radius=0.1, fill=True, color=(1, 0, 0)))\n",
    "plt.show()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
